CodeSum: Translate Program Language to Natural Language

Authors

  • Xing Hu
  • Yuhan Wei
  • Ge Li
  • Zhi Jin
Abstract

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the time spent reading and navigating source code. Code summarization, a critical task in software engineering, therefore aims to generate brief natural language descriptions of source code. In this paper, we propose a new code summarization model named CodeSum. CodeSum combines an attention-based sequence-to-sequence (Seq2Seq) neural network with Structure-based Traversal (SBT) of Abstract Syntax Trees (ASTs). The AST sequences generated by SBT better represent the structure of the ASTs and remain unambiguous. We conduct experiments on three large-scale corpora in different programming languages, i.e., Java, C#, and SQL; the Java corpus is a newly proposed collection of industrial code extracted from GitHub. Experimental results show that CodeSum significantly outperforms the state of the art.

Introduction

Source code summarization is the task of creating readable natural language summaries that describe the functionality of software. It is important for source code comprehension. During software maintenance, programmers spend a lot of time reading and understanding source code snippets. Studies of program comprehension indicate that programmers often read a summary, i.e., a comment describing the function of the code (e.g., JavaDoc descriptions for Java methods), or skim the source code (e.g., read important keywords) to save time (Rodeghero et al. 2014; Sim, Clarke, and Holt 1998). For example, Figure 1 shows a Java method named toIndexName extracted from GitHub. From the summary and the name of the method, developers can easily understand that the method aims to “convert the index of an attacker into a readable name in a battle”. However, these summaries are sometimes missing, incomplete, or outdated.
Therefore, automated source code summarization has become an emerging technology in software engineering. Predicted source code summaries can be used to improve code search by natural language queries, code comprehension, and code...
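
The abstract names Structure-based Traversal (SBT) but this excerpt does not spell out the algorithm. The following is a minimal Python sketch under one assumption: that SBT emits a bracketed pre-order sequence in which every subtree is opened by "(" plus the node label and closed by ")" plus the same label. The `Node` class, the `sbt` function, and the node labels are hypothetical names chosen for illustration, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def sbt(node: Node) -> List[str]:
    """Flatten a tree into a bracketed token sequence (assumed SBT scheme).

    Each subtree is wrapped as: "(" label <children...> ")" label,
    so matching bracket pairs mark where every subtree starts and ends.
    """
    tokens = ["(", node.label]
    for child in node.children:
        tokens.extend(sbt(child))
    tokens.extend([")", node.label])
    return tokens


# Toy tree with illustrative labels (not taken from the paper's corpus).
tree = Node(
    "MethodDeclaration",
    [Node("Modifier"), Node("ReturnStatement", [Node("Name")])],
)
sequence = sbt(tree)
print(" ".join(sequence))
```

Under this assumption, the bracket pairs make the nesting explicit, which is one way a flat sequence can stay unambiguous about the tree it came from; a plain pre-order listing of labels would not distinguish siblings from descendants.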

Similar resources

CS224N Final Project Writeup EMTAIL: Eric-Mike Tree Associations in Language

The perfect natural language program would be able to translate natural language sentences into modal or first order propositions, keep a knowledge and belief database, and generate relevant natural language sentences about the state of the world (from the information in its knowledge base). However, we don’t have a magical sentence-to-logical-proposition converter, and it seemed like that woul...

The Relationship between First and Second Language Literacy in Writing

This paper explores the ways in which the transfer of assumptions from first language (L1) writing can help the process of writing in second language (L2). In learning second language writing skills, learners have two primary sources from which they construct a second language system: knowledge and skills from first language and input from second language. To investigate the relative impact of ...

An Investigation on the Relationship between the Grammatical Competence of Young Iranian English Translation Students and their Ability to Translate from English to Farsi

Today, everything has changed and this has brought a need for learning a second language. Most countries across the world use English as their second/foreign language and the fundamental part of this process is grammar, i.e., the combination of sound, structure, and meaning system of language. A sentence can be composed of several words, clauses, as well as grammatical rules. These grammat...

Fundamental Reform Document of Education and ELT Program: The Investigation of Language Teachers’ Perspectives

The purpose of the current attitudinal study is to investigate the attitudes and opinions of language teachers toward the implemented ELT program resulting from the Fundamental Reform Document of Education in the Iranian Ministry of Education. Three items were investigated: Teacher’s Practice, Teacher Training Courses, and Materials. Following the rigorous and systematic proce...

Music Training Program: A Method Based on Language Development and Principles of Neuroscience to Optimize Speech and Language Skills in Hearing-Impaired Children

Introduction: In recent years, music has been employed in many intervention and rehabilitation programs to enhance cognitive abilities in patients. Numerous studies show that music therapy can help improve language skills in patients, including the hearing impaired. In this study, a new method of music training is introduced based on principles of neuroscience and capabilities of Persian languag...

Journal title:
  • CoRR

Volume abs/1708.01837

Publication date 2017